CHECK THE COUNTRIES

The point of this document is to inspect the country variable. When asked where they were from, participants typed their country into an open text field. Many appear to have skipped this question, so we wanted to check whether the data are truly missing or whether the country they entered was simply not translated or converted into the right code.

The data used here are from the global survey, wave 9.

Variable “country_iso”

The plot only shows countries with at least 500 participants across all waves. Hover over the bar plot to inspect the exact N of participants by country.

We observe that 12,000+ participants did not report their country.
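A minimal sketch of how counts like these could be produced, assuming unreported countries are stored in `country_iso` as empty strings or `NA`. The data frame `toy` and the threshold `min_n` are hypothetical stand-ins, not the survey data or the code behind the plot.

```r
library(dplyr)

# Hypothetical toy data standing in for the survey; not real responses.
toy <- tibble(
  country_iso = c("Canada", "Canada", "Canada", "Brazil", "", NA, "France")
)

min_n <- 2  # the report uses 500; lowered here so the toy data pass the filter

# Treat empty strings as missing, then count participants per country.
country_counts <- toy %>%
  mutate(country_iso = na_if(country_iso, "")) %>%
  count(country_iso, name = "n")

# Countries that would make it into the plot.
plotted <- country_counts %>%
  filter(!is.na(country_iso), n >= min_n)

# Number of participants with no country recorded.
n_missing <- sum(country_counts$n[is.na(country_counts$country_iso)])
```

Converting empty strings to `NA` first means both kinds of missing value end up in a single row of the count, so they are not reported as two separate groups.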

Let’s see what’s up

I wanted to inspect all variables with language and country information to try to understand the nature of the missingness.

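library(dplyr)       # select(), slice(), %>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
# Show the language and country columns for a sample of rows (1-7 and 30-40)
global_raw %>%
  select(startlanguage, contains("country")) %>%
  slice(1:7, 30:40) %>%
  kable() %>%
  kable_styling("hover")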
startlanguage country country_trl country_iso country_trans
fr Canada Canada Canada
fr Canada Canada Canada
en Canada Canada Canada
es Colombia Colombia Colombia
fr Canada Canada Canada
en
en Canada Canada Canada
en Canada Canada Canada
pt-BR Brasil Brazil Brazil
fr Canada Canada Canada
zh-Hant-TW 台灣 Taiwan Taiwan Province of China
en Canada Canada Canada
tr Ankara Kurkiye Ankara Ankara
en
en
fr France France France
fr Canada Canada Canada
he ישראל Israel Israel
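To separate true non-response from a failed conversion, one option is to flag each row by whether the raw entry and the converted code are blank. This is only a sketch: the helper `is_blank()`, the `status` labels, and the toy rows are hypothetical, and it assumes blanks are stored as empty strings or `NA`.

```r
library(dplyr)

# Hypothetical toy rows; the column names follow the table above.
toy <- tibble(
  startlanguage = c("fr", "en", "pt-BR", "en"),
  country       = c("Canada", "", "Brasil", NA),
  country_iso   = c("Canada", "", "", NA)
)

# Hypothetical helper: TRUE when a value is NA or an empty string.
is_blank <- function(x) is.na(x) | x == ""

toy_flagged <- toy %>%
  mutate(status = case_when(
    is_blank(country)     ~ "not reported",
    is_blank(country_iso) ~ "reported, not converted",
    TRUE                  ~ "converted"
  ))

# Tally how many rows fall into each status.
status_counts <- count(toy_flagged, status)
```

Rows flagged "reported, not converted" would be the ones worth fixing in the translation or coding step. Rows flagged "not reported" are genuine non-response.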

Missing info in variable “country”

N = 12,282

Among those who did not report their country, I wanted to inspect the start language. French and English together account for about 58% (3,509 + 3,625 of 12,282), which might suggest that most of these responses were collected in Canada. Not surprising, given the recruitment strategy.

startlanguage n prop
(blank) 1 0%
ar 155 1%
da 15 0%
de 176 1%
el 28 0%
en 3625 30%
es 989 8%
fa 50 0%
fr 3509 29%
he 324 3%
hi 13 0%
hr 8 0%
id 54 0%
it 984 8%
ja 101 1%
ko 32 0%
lt 19 0%
mr 11 0%
ms 246 2%
nl 32 0%
pt 34 0%
pt-BR 700 6%
ro 28 0%
ru 193 2%
sk 75 1%
sq 39 0%
sr-Latn 78 1%
sv 18 0%
swh 13 0%
tl 33 0%
tr 183 1%
uk 2 0%
vi 7 0%
zh-Hans 203 2%
zh-Hant-TW 304 2%
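A breakdown like the one above could be computed roughly as follows. This is a sketch on hypothetical toy data, assuming a missing country is an empty string or `NA` in `country`. The last lines recompute the combined English and French share from the counts reported in the table.

```r
library(dplyr)

# Hypothetical toy data; not real responses.
toy <- tibble(
  startlanguage = c("en", "fr", "en", "es", "fr", "en"),
  country       = c("", "", NA, "Colombia", "Canada", "")
)

# Keep rows with no country, then count and compute rounded percentages.
missing_by_lang <- toy %>%
  filter(is.na(country) | country == "") %>%
  count(startlanguage) %>%
  mutate(prop = paste0(round(100 * n / sum(n)), "%"))

# Combined English + French share, using the counts reported in the table.
en_fr_share <- round(100 * (3625 + 3509) / 12282, 1)
```

Summing the two rounded percentages from the table would overstate the share slightly, so `en_fr_share` is computed from the raw counts.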

MISSING COUNTRY BY WAVE

wave n prop
1 8080 66%
2 1455 12%
3 1167 10%
4 420 3%
5 444 4%
6 268 2%
7 154 1%
8 204 2%
9 90 1%
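As a quick consistency check, the per-wave counts reported above can be summed and compared with the overall N of 12,282, and the percentages recomputed. The vector name `by_wave` is just for illustration.

```r
# Per-wave counts of participants with a missing country, copied from the table.
by_wave <- c(8080, 1455, 1167, 420, 444, 268, 154, 204, 90)

# Total across waves; should equal the overall N of missing countries.
total_missing <- sum(by_wave)

# Share of each wave, rounded to whole percentages as in the table.
prop_by_wave <- round(100 * by_wave / total_missing)
```

The total matches the N reported for the missing-country group, and wave 1 alone contributes about two thirds of it.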
## # A tibble: 0 × 413
## # … with 413 variables: rowid <int>, id <chr>, submitdate <dttm>,
## #   lastpage <chr>, startlanguage <chr>, seed <chr>, startdate <dttm>,
## #   datestamp <dttm>, refurl <chr>, lang <chr>, status <dbl>, country <chr>,
## #   sex <dbl>, age <chr>, edu <dbl>, cempstat_sq001 <dbl>,
## #   cempstat_sq002 <dbl>, cempstat_sq003 <dbl>, cempstat_sq004 <dbl>,
## #   cempstat_sq005 <dbl>, cempstat_sq006 <dbl>, cempstat_sq007 <dbl>,
## #   cempstat_sq008 <dbl>, emplstat_sq001 <dbl>, emplstat_sq002 <dbl>, …

This document was prepared by UK - reach me with any questions/comments!